-
Notifications
You must be signed in to change notification settings - Fork 0
KAFKA-19163: Avoid deleting groups with pending transactional offsets #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
KAFKA-19163: Avoid deleting groups with pending transactional offsets #7
Conversation
When deleting pending transactional offsets, we must preserve the list of groups associated with the producer ID, otherwise we cannot clean up the list of pending transactions for the group once the transaction is committed or aborted.
When a group has pending transactional offsets but no committed offsets, we can accidentally delete it while cleaning up expired offsets. Add a check to avoid this case.
Marking this as a draft until apache#19495 is completed, since there is a merge conflict. |
WalkthroughThe changes introduce a new Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant OffsetMetadataManager
participant Offsets
participant PendingTransactionalOffsets
Client->>OffsetMetadataManager: Commit Offset (possibly transactional)
alt Regular Commit
OffsetMetadataManager->>Offsets: Store offset (preserveGroups = false)
else Transactional Commit
OffsetMetadataManager->>PendingTransactionalOffsets: Store offset (preserveGroups = true)
end
Client->>OffsetMetadataManager: Cleanup Expired Offsets
OffsetMetadataManager->>Offsets: Remove expired offsets
OffsetMetadataManager->>PendingTransactionalOffsets: Check for open transactions
alt No open transactions
OffsetMetadataManager->>Offsets: Remove group if empty
else Open transactions exist
OffsetMetadataManager->>Offsets: Preserve group entry
end
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/OffsetMetadataManager.java (3)
266-268
: Consider opportunistic cleanup of empty groups once transactions resolveWith
preserveGroups == true
we intentionally keep an empty group bucket while a transaction is open.
However, once the transaction is completed (commit/abort) we drop the entireOffsets
instance frompendingTransactionalOffsets
, meaning the additional emptygroupId → {}
entry will be garbage‑collected.
In the corner‑case where the last partition is tomb‑stoned before the transaction ends, the empty map remains alive until the TX finishes – that’s fine, but you could free it immediately by:- if (!preserveGroups && topicOffsets.isEmpty()) + if (( !preserveGroups || offsetsByGroup.get(groupId).isEmpty()) + && topicOffsets.isEmpty()) offsetsByGroup.remove(groupId);Not critical (bounded memory) but can shorten the retention of now‑useless objects.
858-862
: Early‑return now guards against open transactions – good, but update the javadocChanging
if (offsetsByTopic == null) { return !openTransactionsByGroup.containsKey(groupId); }prevents accidental group deletion while
openTransactionsByGroup
still tracks a producer.
Please update the method‑level javadoc (lines 850‑856) to mention thatfalse
can also be returned when no committed offsets exist but pending transactions are present.
894-896
: Behavioural change is subtle – add metrics or TRACE log?The final return value now short‑circuits on
!openTransactionsByGroup.containsKey(groupId)
.
For operators it might be useful to emit a TRACE when cleanup is aborted because of pending transactions, e.g.:if (!openTransactionsByGroup.containsKey(groupId) ) { log.trace("[GroupId {}] Skipping deletion – open transactional offsets still present", groupId); }Helps to diagnose “why does this empty group stick around?” questions.
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/OffsetMetadataManagerTest.java (1)
2658-2691
: Edge‑case test for “pending‑only” group is valuable, consider clarifying commentThe new
testCleanupExpiredOffsetsWithPendingTransactionalOffsetsOnly()
verifies that even when all committed offsets have expired, the group persists while it still owns only pending transactional offsets.Tiny nit: the inline comment has a typo – beacuse → because.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/OffsetMetadataManager.java
(5 hunks)group-coordinator/src/test/java/org/apache/kafka/coordinator/group/OffsetMetadataManagerTest.java
(1 hunks)
🔇 Additional comments (3)
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/OffsetMetadataManager.java (2)
200-207
: Scope guard for groups with pending transactions is clear and self‑documentingThe introduction of
preserveGroups
explicitly documents why empty group entries must be kept around while a transaction is in flight. The in‑line Javadoc explains the intent, which is often overlooked in coordinator code – nice touch!
1005-1006
: Correctly propagates the ‘preserveGroups’ flag to transactional storeUsing
new Offsets(true)
inside the transactional path ensures the coordinator remembers the group even after all pending partitions are removed. This aligns with the cleanup logic introduced above.group-coordinator/src/test/java/org/apache/kafka/coordinator/group/OffsetMetadataManagerTest.java (1)
2596-2655
: Test covers delete‑then‑expire regression – great addition
testCleanupExpiredOffsetsWithDeletedPendingTransactionalOffsets()
faithfully reproduces the scenario fixed by the patch:
- regular commit
- transactional commit (pending)
- tomb‑stone the transactional offset
- ensure group is not deleted until the TX is completed
The asserts match the intended behaviour and the use of
MockTime
keeps the test deterministic.
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a potential issue where groups with pending transactional offsets could be accidentally deleted during expired offset cleanup. The changes introduce a check to prevent this scenario and include new tests to verify the fix. Overall, the changes seem well-structured and address the problem effectively.
Summary of Findings
- Missing Preservation of Groups: The initial
Offsets
class did not account for preserving groups with pending transactions. The addition of thepreserveGroups
flag and its usage in theremove
method ensures that groups associated with pending transactions are not prematurely deleted. - Cleanup Logic: The cleanup logic in
cleanupExpiredOffsets
has been modified to check for open transactions before allowing deletion, preventing accidental removal of groups with pending transactional offsets. - Test Coverage: New tests have been added to cover scenarios with deleted and pending transactional offsets, ensuring that the cleanup process behaves correctly under various conditions.
Merge Readiness
The pull request addresses an important correctness issue and includes thorough testing. I recommend addressing the high severity issue before merging. I am unable to approve this pull request, and users should have others review and approve this code before merging.
When a group has pending transactional offsets but no committed offsets,
we can accidentally delete it while cleaning up expired offsets. Add a
check to avoid this case.
Summary by CodeRabbit